Introduction. — In [1] the authors claim to observe a periodic signal in measurements
of Newton’s gravitational constant, G. Specifically, they find a signal with a 5.9 year period that is
strongly correlated with variations in the observed length of day [2]. They do not suggest
that G actually varies on these time-scales, but rather that there could be some systematic
effect on the measurement process that is correlated with the mechanism that leads to
the variation in the length of day. Here I present a reanalysis of the data used in [1]
using Bayesian model selection to test the hypothesis that the data contains a periodic
signal against other potential models. In light of updated information on the times of
the various G measurements given in [3], I also reanalyse this new dataset with the same
method. In both datasets I have found that a model for the variations in G that contains only
an additional Gaussian noise term is hugely favoured, by factors of $\gtrsim e^{30}$, over
models containing a sinusoid term.^1

Analysis method. — Bayesian model selection provides a natural way to test multiple 
hypotheses by forming the odds ratio of evidences for the different hypotheses. The odds 
ratio for two hypotheses $H_i$ and $H_j$ is given by

$$O_{ij} = \frac{p(d|H_i,I)}{p(d|H_j,I)} \times \frac{p(H_i|I)}{p(H_j|I)},$$ (1)

where $p(d|H_i,I)$ is the evidence for hypothesis $H_i$ given some data $d$, $p(H_i|I)$ is the prior
probability for $H_i$, and $I$ is information concerning any other assumptions. When comparing
hypotheses I assume that they are equally probable a priori, so the prior ratio is unity.
Therefore, I just calculate the ratio of evidences for each hypothesis. If a given hypothesis
is defined by a set of parameters, $\theta_i$, with their own priors, $p(\theta_i|H_i,I)$, then to calculate the
evidence the parameters must be marginalised over, e.g.

$$p(d|H_i,I) = \int p(d|\theta_i,H_i,I)\,p(\theta_i|H_i,I)\,{\rm d}\theta_i,$$ (2)

where $p(d|\theta_i,H_i,I)$ is the likelihood function of the data given a set of model parameters $\theta_i$.
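As a minimal sketch of how eqs. (1) and (2) can be evaluated numerically, the Python below integrates a Gaussian likelihood over a uniform prior on a single offset parameter with the trapezoid rule, and then forms a log odds ratio against a zero-parameter model. The function names, the grid integration and the toy numbers are all assumptions made for illustration; they are not taken from the analysis code in the repository.

```python
import math

def log_normal_pdf(x, mean, sigma):
    """ln of the normal density N(x | mean, sigma^2)."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def log_sum_exp(log_terms):
    """Numerically stable ln(sum(exp(t))) for a list of log terms."""
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))

def log_evidence_offset(data, sigmas, mu_min, mu_max, n_grid=2001):
    """ln p(d|H,I) for a constant-offset model, i.e. eq. (2) on a 1-D grid.

    Assumes a uniform prior on the offset mu over [mu_min, mu_max] and
    integrates likelihood x prior with the trapezoid rule.
    """
    step = (mu_max - mu_min) / (n_grid - 1)
    log_prior = -math.log(mu_max - mu_min)
    log_terms = []
    for i in range(n_grid):
        mu = mu_min + i * step
        log_like = sum(log_normal_pdf(d, mu, s) for d, s in zip(data, sigmas))
        weight = 0.5 if i in (0, n_grid - 1) else 1.0  # trapezoid end points
        log_terms.append(log_like + log_prior + math.log(weight * step))
    return log_sum_exp(log_terms)

def log_odds(log_evidence_i, log_evidence_j, log_prior_ratio=0.0):
    """ln O_ij from eq. (1); log_prior_ratio = 0 means equal prior odds."""
    return log_evidence_i - log_evidence_j + log_prior_ratio

# Toy numbers (hypothetical, not the G data).
toy_data = [0.3, -0.1, 0.2]
toy_sigmas = [0.1, 0.1, 0.1]
ln_z_free = log_evidence_offset(toy_data, toy_sigmas, -5.0, 5.0)
# A zero-parameter model with the offset fixed at 0: evidence = likelihood.
ln_z_fixed = sum(log_normal_pdf(d, 0.0, s) for d, s in zip(toy_data, toy_sigmas))
ln_o = log_odds(ln_z_free, ln_z_fixed)
print("ln O (free offset vs fixed offset):", ln_o)
```

Working with log evidences avoids numerical underflow when the likelihood is tiny over most of the prior range. A grid is only practical for one or two parameters; models with more parameters would normally need a sampling-based evidence estimate instead.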

^1 The code, data tables, figures and prior ranges for this analysis can be found at
https://github.com/mattpitkin/periodicG 
Table 1: Log odds ratios for the four hypotheses (i represents rows and j represents columns) when
using the data used in [1], with those when using the data from [3] in parentheses.

ln O_ij    H_2            H_3             H_4
H_1        -133 (-140)    -102.2 (-66)    -103 (-110)
H_2                       30 (74)         30 (30)
H_3                                       -0.3 (-45)


The general model that I use for my hypotheses is

$$m(\mu_G, A, P, \phi_0, T_k) = A\sin\left(\phi_0 + 2\pi(T_k - t_0)/P\right) + \mu_G,$$ (3)

where $\mu_G$ is an offset value, $A$ is the sinusoid amplitude, $\phi_0$ is an initial phase at an epoch
$t_0$, $P$ is the sinusoid period, and $T_k$ is the time.

In this analysis I have compared four different hypotheses, $H_i$, to explain the measurements
of G: $H_1$) the data is consistent with Gaussian errors, given by the experimental error
bars $\sigma_{G_k}$, about an unknown $\mu_G$; $H_2$) as for $H_1$, but also including an unknown common
Gaussian noise term $\sigma_{\rm sys}$; $H_3$) as for $H_1$, but also including a sinusoid with unknown $A$, $\phi_0$
and $P$; and $H_4$) as for $H_3$, but also including an unknown $\sigma_{\rm sys}$. These each correspond to
a different set of parameters required in $\theta$ and also the number of parameters required in
the integral of eq. (2).
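The four hypotheses can be viewed as special cases of one Gaussian likelihood built on the model of eq. (3). The sketch below illustrates this under one stated assumption: that the common noise term is combined with each experimental error bar in quadrature. The function and dictionary names are hypothetical and the code is not the implementation used for the results.

```python
import math

def model(mu_G, A, P, phi0, T_k, t0=0.0):
    """Eq. (3): constant offset mu_G plus a sinusoid."""
    return A * math.sin(phi0 + 2.0 * math.pi * (T_k - t0) / P) + mu_G

def log_likelihood(G, sigma_G, T, mu_G, sigma_sys=0.0, A=0.0, P=1.0, phi0=0.0, t0=0.0):
    """Gaussian ln-likelihood covering all four hypotheses.

    Assumption: sigma_sys is added in quadrature to each error bar.
    H_1: sigma_sys = 0 and A = 0      H_2: A = 0
    H_3: sigma_sys = 0                H_4: all parameters free
    """
    total = 0.0
    for G_k, s_k, T_k in zip(G, sigma_G, T):
        var = s_k ** 2 + sigma_sys ** 2
        resid = G_k - model(mu_G, A, P, phi0, T_k, t0)
        total += -0.5 * resid ** 2 / var - 0.5 * math.log(2.0 * math.pi * var)
    return total

# Free parameters entering the integral of eq. (2) for each hypothesis.
PARAMETERS = {
    "H1": ("mu_G",),
    "H2": ("mu_G", "sigma_sys"),
    "H3": ("mu_G", "A", "phi0", "P"),
    "H4": ("mu_G", "A", "phi0", "P", "sigma_sys"),
}
```

Written this way, the dimension of the marginalisation integral grows from one parameter for $H_1$ to five for $H_4$, which is where the difference in prior volume between the hypotheses comes from.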

For an initial examination of the claim in [1] I have used their Figure 1 to read off the
experimental times and then used Table XVII of [1] for values of G.^2

In [1] the experiment times are given no associated error. However, many of the times
used correspond to the received date of the respective paper rather than the date of the
actual experiment. In analysing this data I specified uncertainties on the experiment times
of $\sigma_T = 0.25$ years (with the exception of the JILA-10 and LENS-14 measurements for which
I use uncertainties of one week) before the given time. I have taken this time uncertainty
into account by marginalising over it for each data point.
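One way to picture this per-point marginalisation is sketched below. It assumes, purely for illustration, that the true time of a measurement is uniformly distributed over the interval of length $\sigma_T$ ending at the quoted time; the text above does not state the form of the time prior, and the function names are hypothetical.

```python
import math

def model(mu_G, A, P, phi0, T, t0=0.0):
    """Eq. (3): constant offset mu_G plus a sinusoid."""
    return A * math.sin(phi0 + 2.0 * math.pi * (T - t0) / P) + mu_G

def normal_pdf(x, mean, sigma):
    """Normal density N(x | mean, sigma^2)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def time_marginalised_likelihood(G_k, sigma_k, T_k, sigma_T, mu_G, A, P, phi0, n_grid=201):
    """Likelihood of one measurement averaged over its unknown true time.

    Assumption: the true time is uniform on [T_k - sigma_T, T_k], i.e. the
    experiment took place up to sigma_T before the quoted date.
    """
    step = sigma_T / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        T = T_k - sigma_T + i * step
        weight = 0.5 if i in (0, n_grid - 1) else 1.0  # trapezoid end points
        total += weight * normal_pdf(G_k, model(mu_G, A, P, phi0, T), sigma_k)
    return total * step / sigma_T  # divide by the prior width sigma_T

# Hypothetical values: with A = 0 the model has no time dependence,
# so the marginalised likelihood reduces to the ordinary Gaussian one.
flat = time_marginalised_likelihood(6.674, 0.001, 2010.0, 0.25, 6.674, 0.0, 5.9, 0.0)
```

If the time errors of different measurements are independent, the full likelihood for a given set of model parameters is the product of these per-point terms. The time uncertainty only matters for hypotheses with a sinusoid, since a constant offset does not depend on time.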

Results. — The odds ratios comparing hypotheses when using the G dataset of [1] are
summarised in table 1. It is clear that hypotheses including extra parameters over that for
$H_1$ are hugely favoured by factors of $\gtrsim e^{100}$. The two hypotheses, $H_3$ and $H_4$, containing
a sinusoidal signal are both approximately equally probable. However, $H_2$, just containing
the additional unknown noise term $\sigma_{\rm sys}$ and the unknown offset $\mu_G$, is hugely favoured by
factors of $\gtrsim e^{30}$ over $H_3$ and $H_4$. This shows that the simple model for which variations are just
due to an unknown Gaussian noise term is far more likely to be the cause of the variations
than an additional sinusoidal variation. This is due to Bayesian model selection naturally
applying a penalty for including additional parameters that do not significantly increase the
evidence.
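Because every entry of Table 1 is a difference of log evidences, the entries should chain together. The short check below uses the tabulated values for the data of [1]; it is offered as a consistency check on the table and is not part of the original analysis.

```latex
\ln O_{23} = \ln O_{13} - \ln O_{12} = -102.2 - (-133) \approx 31,
\qquad
\ln O_{24} = \ln O_{14} - \ln O_{12} = -103 - (-133) = 30 .
```

Both values agree with the tabulated entry of 30 to within the rounding of the table, and a log odds ratio of 30 corresponds to a factor of $e^{30} \approx 10^{13}$.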

I have also looked at the posterior probability distributions for the period, and for $H_3$
I see a clear lone spike in probability around the claimed period of 5.9 years. A similar
spike shows up for $H_4$, but is much less pronounced. I have assessed the significance of this
period probability peak for $H_3$ by rerunning that analysis 20 times, but each time randomly
shuffling the G values to remove any real periodicity in the data. Out of these 20 runs there 
is one time when the hypothesis using the shuffled data is more favoured than when using 
the un-shuffled data and another couple that are within a factor of two. The posteriors for 
these cases also show very similar spikes in the period to that from the unshuffled data. 
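The logic of this shuffle test can be sketched as follows. In the analysis described above the quantity compared between runs is the evidence for the sinusoid hypothesis; here a much cheaper stand-in statistic, the squared projection of the data onto a sinusoid of fixed period, is swapped in so that the example stays self-contained. The function names and the toy series are hypothetical.

```python
import math
import random

def sinusoid_power(times, values, period):
    """Stand-in statistic: squared projection of mean-subtracted values onto a sinusoid."""
    mean = sum(values) / len(values)
    c = sum((v - mean) * math.cos(2.0 * math.pi * t / period) for t, v in zip(times, values))
    s = sum((v - mean) * math.sin(2.0 * math.pi * t / period) for t, v in zip(times, values))
    return c * c + s * s

def shuffle_test(times, values, statistic, n_shuffles=20, seed=0):
    """Count shuffles whose statistic is at least as large as the unshuffled one."""
    rng = random.Random(seed)
    observed = statistic(times, values)
    count = 0
    for _ in range(n_shuffles):
        shuffled = list(values)
        rng.shuffle(shuffled)  # break any time ordering but keep the same values
        if statistic(times, shuffled) >= observed:
            count += 1
    return count

# Hypothetical toy series containing a genuine 5.9-unit periodicity.
toy_times = [float(i) for i in range(12)]
toy_values = [math.sin(2.0 * math.pi * t / 5.9) for t in toy_times]
n_exceed = shuffle_test(toy_times, toy_values, lambda t, v: sinusoid_power(t, v, 5.9))
print(n_exceed, "of 20 shuffles match or beat the unshuffled statistic")
```

If shuffled data regularly match or beat the unshuffled data, the apparent periodicity cannot be distinguished from chance alignments of the measured values. This sketch shuffles only the values; whether the error bars should move with them is a separate choice that is not specified here.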

Since the acceptance of [1], Schlamminger et al. [3] have examined the claim, in particular
noting that the experimental times in the original work are not accurate. They examined
the literature to compile a more complete list of experiments with information on the actual
dates on which the experiments were performed. I have reanalysed this new dataset for each
of the four hypotheses. When marginalising over the time error I have now set the error to
be symmetric around the mean experiment times. For all other parameters I have used the
same prior ranges as in the initial analysis. The odds ratios for each of these cases are also
given in table 1, from which it can be seen that $H_2$ is still favoured over all other hypotheses
by a huge amount. However, $H_3$ is now hugely disfavoured compared with $H_4$, i.e. just including a
sinusoid but adding no additional noise term does far worse at fitting the data than also
including the noise term.

^2 For the BIPM-13 measurements I used the combined servo and Cavendish value from [5] and for the
LENS-14 measurements I used the values from [6].

Conclusions. — I have reanalysed the data consisting of measurements of G from [1] 
and [3] to assess the claim of a periodic component with a period of 5.9 years.

Using Bayesian model selection, and four different hypotheses to describe the variations 
in the data, and including uncertainties on the experimental times, I have found that the 
best model is one in which there is an additional unknown Gaussian noise term on top of 
the observed experimental errors. This is favoured over a model also containing a sinusoidal
term by factors of $\gtrsim e^{30}$. I also find that periodic signals can easily be found in random
permutations of the data, suggesting that the observed periodicity seen in [1] is just a random
artifact of the data.

Following the publication of [1] the authors have taken into account the work of [3] (see
H)- They fit an additional sinusoid to the updated data and note that the significance of 
the correlation with the length of day decreases. I expect that calculating the evidence for a 
model including two sinusoids would not cause such a model to be favoured over the simpler 
model containing just the extra Gaussian noise term, as the increase in parameter space will 
be penalised if the fit does not significantly improve. 

I note that if there were good a priori reasons to expect a periodic component with a 
specific period in the data (i.e. if there were a good reason why the mechanism leading to 
changes in the length of day could couple into measurements of G), then the evidence for 
models containing such a periodic signal might dramatically increase. However, without 
such prior knowledge using such a constraint would strongly bias us. 

* * * 

I would like to thank Prof. J. Faller for useful discussions, Prof. C. Speake for putting me 
in contact with Dr. S. Schlamminger, and Dr. Schlamminger for providing me with a data file 
of their compiled G measurements. I am funded by the STFC under grant ST/L000946/1. 

Additional remark: In responding to my Comment [5] the authors of the original article 
put forward two related pieces of evidence to suggest that models for the time variations 
of G measurements containing a constant offset and one or two periodic components are 
favoured over a model with no periodic component. They show that once the best fits 
for the models containing one or two periodic components are removed from the data the 
residuals have smaller variance, and have a distribution that is closer to a Gaussian distribution,
than for the model containing no periodic component. These findings are not at all
surprising. If one adds more parameters to the model, then one can almost always find a 
better fit with smaller residuals - in cases where the additional model complexity adds no 
extra information this is commonly known as over-fitting and is related to the concept of 
Occam’s razor. A Bayesian analysis, such as I performed, naturally allows one to penalise 
extra complexity when it is unnecessary through the incorporation of an Occam factor. This 
Occam factor penalty comes about through use of the prior volume of the parameter space 
for each model. Each parameter has a range over which it is a priori thought to exist, and 
in my analysis the prior volume is just the product of these ranges. So, a model with more 
parameters will often have a larger prior volume. This naturally incorporates a penalty for 
over-fitting in that larger prior volumes will down-weight the evidence for a model, so if the
likelihood does not compensate enough for this down-weighting then the evidence will be 
reduced. 
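The size of this penalty can be made explicit with a standard textbook approximation, which is not a formula taken from the analysis itself. For a single parameter $\theta$ with a uniform prior of width $\Delta\theta$, and a likelihood sharply peaked at $\hat{\theta}$ with characteristic width $\delta\theta$ lying well inside the prior range, eq. (2) becomes roughly:

```latex
p(d|H,I) = \frac{1}{\Delta\theta}\int p(d|\theta,H,I)\,{\rm d}\theta
\approx p(d|\hat{\theta},H,I)\,\frac{\delta\theta}{\Delta\theta} .
```

The ratio $\delta\theta/\Delta\theta \le 1$ is the Occam factor. With several parameters and independent uniform priors one such factor arises per parameter, so each extra parameter must raise the maximum likelihood by more than the inverse of its Occam factor before the evidence increases.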

In my analysis this penalty far outweighs the slightly better fit that can be achieved with
a more complex model containing a sinusoid. It should also be noted that my most favoured 
model does not just contain a constant offset, but also contains an additional unknown noise 
term. This may suggest that the quoted errors on the G measurements are underestimates 
of the true noise. 

A further addition that I was able to include in my analysis, but which the authors of
the original article do not appear to have addressed, is that there are also error bars on the
experimental times. This may also weaken their fit.

Further G measurement data may change my conclusions, but with the current data I 
stand by my result that the data is best explained by just a constant offset and an additional 
unknown noise term, and no periodic component is required. 